(54) CLUSTERING DEVICE

(57)Abstract 

PROBLEM TO BE SOLVED: To provide a clustering device capable of
coping with dynamic changes of input data in clustering, with a
simple configuration and procedure.
SOLUTION: This clustering device, which classifies input data using
clusters, is provided with a cluster preparing device 1 for preparing
a cluster; a clustering performing device 2 for performing the
clustering of the input data using the cluster prepared by the
cluster preparing device; a clustering result monitoring device 3
for monitoring the clustering result of the clustering performing
device and identifying erroneously classified input data; and a
storage means 8 for storing the erroneously classified input data.
When a fixed number or more of data are stored in the storage means,
the cluster preparing device prepares a new cluster based on those
data. Thus, the cluster can be corrected in response to dynamic
changes of the input data, and erroneous classification can be
reduced.
[Claim 4] The clustering device according to claim 1, wherein said cluster preparation device comprises the following:

A self-organization map generating means which generates a self-organization map by considering prototype data as an input.

A cluster formation means to classify said generated self-organization map and to form a cluster.

[Claim 5] The clustering device according to claim 4, provided with the following, and characterized in that, when said self-organization map correcting means generates a self-organization map, the cluster formation means of said cluster preparation device classifies that self-organization map, creates a cluster, and adds it to the cluster already created:

A clustering result monitor means by which said clustering result monitoring instrument supervises a clustering result of said clustering performing device.

A self-organization map correcting means which generates a self-organization map by considering said data as an input when data more than a fixed number are stored in said accumulation means.
DETAILED DESCRIPTION



[Detailed Description of the Invention] 
[0001] 

[Field of the Invention] This invention relates to a clustering device which classifies a large amount of data into classes according to their similarity, and in particular to one that can respond appropriately to dynamic changes of input data.
[0002] 

[Description of the Prior Art] Various clustering techniques have conventionally been proposed. An example of the most common clustering device is shown in drawing 6.

[0003] In drawing 6, 100 shows the prototype data constellation for study; 102 and 103 indicate cluster A and cluster B, which are initial clusters, one for each datum of the prototype data constellation; 104 shows the distance between cluster A102 and cluster B103; and 105 shows cluster C, which unifies cluster A102 and cluster B103. 200 shows the cluster result created from prototype data, and 201 and 202 show cluster Y and cluster Z, created eventually. 300 shows the clustering device which uses the clusters: 301 shows cluster Y, identical to 201; 302 shows cluster Z, identical to 202; 303 shows input X, which is the object of clustering; and 304 shows the point of input X303 in the space in which the clusters exist.

[0004] In this device, the clusters required by the clustering device are created first, by the following procedure.

[0005] The pair of clusters with the nearest distance is searched for in the prototype data constellation 100 for study. If cluster A102 and cluster B103 are chosen as a result, these two are unified into cluster C105, and clusters A and B are deleted. At this time, cluster C105 holds the values of both cluster A102 and cluster B103. Next, the same series of operations, searching the prototype data constellation 100 for the pair of clusters with the nearest distance and unifying them, is repeated. The work ends when the total number of clusters becomes 1, or when the distance between the nearest pair of clusters is larger than a certain constant value.
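The nearest-pair merging procedure of [0005] (reused in [0019]) can be sketched in Python as follows. This is a minimal sketch, not the device's actual implementation: the text does not reproduce formula 1 of drawing 5, so the Euclidean distance between cluster centroids is assumed here, and the names `centroid`, `cluster_distance`, and `agglomerate` are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def centroid(cluster: List[Point]) -> List[float]:
    """Mean vector of all values held by a cluster."""
    dim = len(cluster[0])
    return [sum(p[i] for p in cluster) / len(cluster) for i in range(dim)]


def cluster_distance(a: List[Point], b: List[Point]) -> float:
    # ASSUMPTION: centroid-to-centroid Euclidean distance stands in for
    # formula 1 of drawing 5, which is not reproduced in the text.
    return math.dist(centroid(a), centroid(b))


def agglomerate(points: List[Point], max_distance: float) -> List[List[Point]]:
    """Repeatedly unify the nearest pair of clusters.

    Stops when one cluster remains, or when the nearest pair is farther
    apart than max_distance (the "certain constant value").
    """
    clusters = [[p] for p in points]  # every datum starts as its own cluster
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break
        merged = clusters[i] + clusters[j]  # the new cluster keeps all values of both
        clusters.pop(j)  # delete the two integrated clusters (j > i, so pop j first)
        clusters.pop(i)
        clusters.append(merged)
    return clusters
```

For example, `agglomerate([(0, 0), (0, 1), (10, 10), (10, 11)], max_distance=3.0)` stops with two clusters, because the two remaining centroids are about 14.1 apart.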
[0006] Through this series of operations, the cluster result 200 created from prototype data is obtained, and the finally integrated clusters become cluster Y201 and cluster Z202.
[0007] The clustering device 300 using clusters performs the actual clustering with these finally integrated clusters. When input X303 is inputted into the clustering device 300 and is included in cluster Y301, input X303 is clustered into cluster Y301.

[0008] A clustering technique that uses a neural network called the self-organization map (SOM: Self-Organization Map; described in detail in T. Kohonen, "Self-Organization and Associative Memory", Third Edition, Springer-Verlag, Berlin, 1989) is also known (JP 7-234853 A). In this method, prototype data is inputted into the SOM, the neurones forming the SOM are trained, and the trained neurones are classified into clusters. When input data is given to the SOM after the clusters are formed, the neurone with the value nearest the input is determined, and the input data is thereby clustered.
[0009] [Problem(s) to be Solved by the Invention] However, in the above clustering techniques, since the clusters are formed using prototype data, the clusters formed are biased toward the prototype data alone. Therefore, when live data are actually clustered using these clusters, there is the problem of being unable to respond to dynamic changes of input data.

[0010] That is, when data that should belong to a new class arise with the passage of time, the conventional method cannot respond to them, and false sorts occur.

[0011] In order to prevent these false sorts, the conventional method is forced to recluster using all the data, including the prototype data. A method of correcting a cluster when data are newly added is disclosed in JP 5-205058 A, but it requires knowing that new data have been added, and the correction of the cluster due to the addition of data must be requested from outside. The cluster therefore cannot be corrected automatically.

[0012] This invention solves such conventional problems, and aims to provide a clustering device that can respond to dynamic changes of input data with a simple configuration and procedure.

[0013] 

[Means for Solving the Problem] Accordingly, in this invention, a clustering device which classifies input data using clusters is provided with a cluster preparation device which creates a cluster; a clustering performing device which performs clustering of input data using a cluster created by the cluster preparation device; a clustering result monitoring instrument which supervises the clustering result of the clustering performing device and identifies input data by which false sorts were carried out; and an accumulation means which accumulates the input data by which false sorts were carried out. When data more than a fixed number are stored in the accumulation means, the cluster preparation device creates a new cluster based on those data.

[0014] Thereby, a cluster can be corrected in response to a dynamic change of input data, and false sorts can be reduced.

[0015] 

[Embodiment of the Invention] Hereafter, embodiments of the invention are described using the drawings. This invention is not limited to these embodiments, and can be carried out in various modes without departing from its gist.

[0016] (A 1st embodiment) As shown in drawing 1, the clustering device of a 1st embodiment is provided with the following:

Prototype data DB4, which manages prototype data.

The cluster preparation device 1, which creates the cluster 5 using prototype data.

The clustering performing device 2, which clusters the input data 6 using the created cluster 5.

The clustering result monitoring instrument 3, which supervises the clustering result 7 of the clustering performing device 2.

False-sorts input data DB8, which manages the data judged to be false sorts by the clustering result monitoring instrument 3.

[0017] In this device, the cluster preparation device 1 generates the cluster 5 using prototype data DB4. The clustering performing device 2 clusters the inputted input data 6 using the generated cluster 5 and outputs the clustering result 7. The clustering result monitoring instrument 3 supervises the outputted clustering result 7. When it judges that the error included in the clustering result 7 of the clustering performing device 2 is a value beyond a certain constant value, so that the result is clearly a false sort, it adds the input data 6 to false-sorts input data DB8 and counts the number of data collected in false-sorts input data DB8. When the number of data in false-sorts input data DB8 exceeds a fixed number, it passes false-sorts input data DB8 to the cluster preparation device 1 and directs it to create a cluster.
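The bookkeeping of the clustering result monitoring instrument 3 in [0017] can be sketched as below. The class name, its parameters (`error_threshold` for the "certain constant value", `max_false_sorts` for the "fixed number"), and the callback standing in for the direction to the cluster preparation device 1 are all hypothetical. Clearing DB8 inside the monitor is a simplification; in [0021] the cluster preparation device clears it after creating the new clusters.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


class ClusteringResultMonitor:
    """Sketch of the clustering result monitoring instrument 3 (hypothetical API)."""

    def __init__(self, error_threshold: float, max_false_sorts: int,
                 request_cluster_creation: Callable[[List[Vector]], None]):
        self.error_threshold = error_threshold  # the "certain constant value"
        self.max_false_sorts = max_false_sorts  # the "fixed number"
        self.request_cluster_creation = request_cluster_creation
        self.false_sorts_db: List[Vector] = []  # stands in for false-sorts input data DB8

    def observe(self, input_data: Vector, error: float) -> bool:
        """Record one clustering result; return True if cluster creation was requested."""
        if error <= self.error_threshold:
            return False  # not clearly a false sort
        self.false_sorts_db.append(input_data)
        if len(self.false_sorts_db) > self.max_false_sorts:
            # Pass a copy of the accumulated data to the cluster preparation device.
            self.request_cluster_creation(list(self.false_sorts_db))
            # Simplification: clear DB8 here once the request has been handled.
            self.false_sorts_db.clear()
            return True
        return False
```

Passing a callback keeps the monitor independent of how clusters are rebuilt, so the same sketch fits both the 1st embodiment (direct re-clustering) and the 2nd (SOM-based).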

[0018] The operation of each device is explained in more detail. First, the cluster preparation device 1 operates when the cluster 5 has not been created, and when creation of a cluster is directed by the clustering result monitoring instrument 3.

[0019] When the cluster 5 has not been created, each datum of prototype data DB4 is considered an initial cluster, and the pair of clusters with the nearest distance is looked for among them. This distance is found by formula 1 of drawing 5. The two clusters found at this time are unified into a new cluster. The integrated clusters are deleted, and the newly made cluster has all the values of the clusters deleted by integration. The series of operations of looking for the pair of clusters with the nearest distance in the prototype data DB and unifying them is then repeated in a similar manner. The work ends when the total number of clusters becomes 1, or when the distance between the nearest clusters is larger than a certain constant value.

[0020] By this series of operations, the cluster 5 is created from prototype data DB4.
[0021] Next, when directions for cluster creation are received from the clustering result monitoring instrument 3, clusters are created from false-sorts input data DB8 by the same operation as for creating the cluster 5. Of the clusters created at this time, those containing at least a fixed number of values are taken as new clusters and added to the cluster 5. Finally, the false-sorts input data DB is cleared.
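The re-creation step of [0021] can be sketched as follows. The sketch assumes the same nearest-pair merging as for prototype data (again with centroid Euclidean distance as an assumed stand-in for formula 1 of drawing 5) and reads "more than fixed" as "at least `min_members` values". The names `agglomerate`, `add_clusters_from_false_sorts`, and `min_members` are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _centroid(cluster: List[Point]) -> List[float]:
    """Mean vector of the values held by a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def agglomerate(points: List[Point], max_distance: float) -> List[List[Point]]:
    # Nearest-pair merging as for prototype data; centroid Euclidean distance
    # is an ASSUMED stand-in for formula 1 of drawing 5.
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        d, i, j = min(
            (math.dist(_centroid(a), _centroid(b)), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        if d > max_distance:
            break
        merged = clusters[i] + clusters[j]
        del clusters[j], clusters[i]  # j > i, so the later index is deleted first
        clusters.append(merged)
    return clusters


def add_clusters_from_false_sorts(existing: List[List[Point]],
                                  false_sorts_db: List[Point],
                                  max_distance: float,
                                  min_members: int) -> List[List[Point]]:
    """Cluster DB8, add clusters with >= min_members values to cluster 5, clear DB8."""
    new_clusters = agglomerate(list(false_sorts_db), max_distance)
    existing.extend(c for c in new_clusters if len(c) >= min_members)
    false_sorts_db.clear()  # "Finally, the false-sorts input data DB is cleared."
    return existing
```

The size filter discards isolated outliers among the false sorts, so only groups of misclassified data that recur become new clusters.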

[0022] Next, the operation of the clustering performing device 2 is explained. Using the cluster 5 created by the cluster preparation device 1, the cluster nearest in distance to the inputted input data 6 is chosen. This distance is calculated by formula 1 of drawing 5. At this time, the chosen cluster and the calculated distance, which indicates the error, are outputted as the clustering result 7.
[0023] When the error included in the outputted clustering result 7, namely the calculated distance, is a value beyond a certain constant value, the clustering result monitoring instrument 3 adds the input data 6 to false-sorts input data DB8 and counts the number. When the number of data in false-sorts input data DB8 exceeds a fixed number, it passes false-sorts input data DB8 to the cluster preparation device 1 and directs it to create a cluster.
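The operations of [0022] and [0023] amount to a nearest-cluster lookup whose distance doubles as the error value. A minimal sketch, assuming each cluster is represented by the centroid of its values and that Euclidean distance replaces formula 1 of drawing 5 (neither is specified in the text); `perform_clustering` and `is_false_sort` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def centroid(cluster: List[Point]) -> List[float]:
    """Mean vector of the values held by a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def perform_clustering(x: Point, clusters: List[List[Point]]) -> Tuple[int, float]:
    """Return (index of the nearest cluster, distance), i.e. the clustering result 7.

    ASSUMPTION: distance to the cluster centroid stands in for formula 1 of
    drawing 5; the same distance is reported as the error.
    """
    distances = [math.dist(x, centroid(c)) for c in clusters]
    best = min(range(len(clusters)), key=distances.__getitem__)
    return best, distances[best]


def is_false_sort(error: float, error_threshold: float) -> bool:
    # [0023]: a distance beyond the constant value marks the input as a false sort.
    return error > error_threshold
```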

[0024] As described above, in the clustering device of this embodiment, clusters can be created automatically even during operation, in response to dynamic changes of input data. False sorts resulting from a dynamic change of input data are therefore promptly suppressed. Moreover, since re-creation of a cluster uses only the data automatically collected as false-sorts data in the course of clustering live data, a cluster can be corrected with little burden.
[0025] (A 2nd embodiment) The clustering device of a 2nd embodiment creates a cluster using a self-organization map (henceforth SOM).
[0026] As shown in drawing 2, this device comprises, like the 1st embodiment, prototype data DB4, the cluster preparation device 1, the clustering performing device 2, the clustering result monitoring instrument 3, and false-sorts input data DB8, and is further provided with the following:

The data input means 11, by which the cluster preparation device 1 inputs prototype data as in the 1st embodiment.

The SOM preparing means 12, which creates SOM9.

The cluster creating means 13, which generates a cluster using SOM9.

The clustering result monitor means 31, by which the clustering result monitoring instrument 3 supervises the clustering result 7 of the clustering performing device 2.

The SOM correcting means 32, which creates SOM10 using the data of false-sorts input data DB8.

[0027] In this device, the data input means 11 of the cluster preparation device 1 inputs data from prototype data DB4, the SOM preparing means 12 creates SOM9 using this data, and the cluster creating means 13 generates the cluster 5 using SOM9. The clustering performing device 2 clusters the inputted input data 6 using the generated cluster 5 and outputs the clustering result 7. When it judges that the error included in the outputted clustering result 7 is a value beyond a certain constant value, so that the result is clearly a false sort, the clustering result monitor means 31 of the clustering result monitoring instrument 3 adds the input data 6 to false-sorts input data DB8 and counts the number.
[0028] When the number of data in false-sorts input data DB8 exceeds a fixed number, the SOM correcting means 32 newly creates SOM10 using the data of false-sorts input data DB8 as input, and directs the cluster creating means 13 to create a cluster using SOM10. In response, the cluster creating means 13 creates a cluster using SOM10 and adds it to the already created cluster 5.

[0029] Next, the operation of each part is explained in more detail, beginning with the SOM preparing means 12.

[0030] As shown in drawing 4, SOM 401 is formed from neurones 402 arranged in two dimensions, and each neurone has a vector of the same dimension as the input, called the reference vector 403.
[0031] The SOM preparing means 12 creates a SOM by the procedure shown in the flow chart of drawing 3.

Step A1: Set the learning count T to 0, create neurones arranged in two dimensions, and initialize the reference vector of each neurone, which has the same dimension as the input, with random numbers.

[0032] Steps A2 and A3: One datum is chosen at random from prototype data DB4, and the data input means 11 takes it out.
[0033] Step A4: For this datum, determine the neurone C whose reference vector satisfies formula (2) of drawing 5.

[0034] Step A5: Update the reference vectors of the neurones located near the neurone C according to formula (3) of drawing 5.

[0035] Step A6: When the specified learning count T is reached, proceed to Step A8 (end).

[0036] In Step A6, when the learning count T has not reached the prescribed number, the process proceeds to Step A7: the learning count T is incremented, and the process returns to Step A2.
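Steps A1 to A7 can be sketched as a plain-Python SOM training loop. Because formulas (2) and (3) of drawing 5 are not reproduced, this sketch assumes the standard Kohonen choices: the winner C is the neurone whose reference vector is nearest the datum, and the neurones within a shrinking grid radius of C move toward the datum by a linearly decaying learning rate. `train_som` and its parameters (`alpha0`, `radius0`, `seed`) are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def train_som(data: Sequence[Sequence[float]], rows: int, cols: int, steps: int,
              alpha0: float = 0.5, radius0: float = 2.0,
              seed: int = 0) -> List[List[Vector]]:
    """Minimal SOM training loop following Steps A1 to A7.

    ASSUMPTIONS (drawing 5 is not reproduced): nearest-reference-vector winner,
    and a neighbourhood of grid radius radius0 * (1 - T/steps) updated with
    learning rate alpha0 * (1 - T/steps).
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Step A1: T = 0; 2-D grid of neurones with random reference vectors.
    som = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
           for _ in range(rows)]
    for t in range(steps):  # Steps A6/A7: repeat until T reaches `steps`
        x = rng.choice(data)  # Steps A2-A3: take out one datum at random
        # Step A4: winner neurone C = nearest reference vector.
        cr, cc = min(((r, c) for r in range(rows) for c in range(cols)),
                     key=lambda rc: math.dist(x, som[rc[0]][rc[1]]))
        frac = 1.0 - t / steps
        alpha, radius = alpha0 * frac, radius0 * frac
        # Step A5: move the neurones near C toward the datum.
        for r in range(rows):
            for c in range(cols):
                if math.hypot(r - cr, c - cc) <= radius:
                    m = som[r][c]
                    for k in range(dim):
                        m[k] += alpha * (x[k] - m[k])
    return som  # Step A8: end
```

Seeding a private `random.Random` keeps runs reproducible without touching the global random state.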

[0037] Next, the cluster creating means 13 operates when the cluster 5 has not been created, and when directions for creating a cluster are received from the SOM correcting means 32.

[0038] First, when the cluster 5 has not been created, the cluster 5 is created using SOM9. From the neurones of SOM9, those whose reference vectors satisfy formula (4) of drawing 5 are chosen, and each selected neurone is considered an initial cluster. The pair of clusters with the nearest distance is looked for among them; this distance is found by formula (1) of drawing 5. The two clusters found at this time are unified into a new cluster. The integrated clusters are deleted, and the newly made cluster has all the values of the clusters deleted by integration. The pair of clusters whose distance is the nearest is again looked for in the same way, and the series of unifying operations is repeated. The work ends when the total number of clusters becomes 1, or when the distance between the nearest clusters is larger than a certain constant value.

[0039] When directions for cluster creation are received from the SOM correcting means 32, a cluster is similarly created using SOM10 and added to the cluster 5.

[0040] As in the 1st embodiment, the clustering performing device 2 clusters the input data 6 using the cluster 5 and outputs the clustering result 7. When it judges that the error included in the outputted clustering result 7 is a value beyond a certain constant value, so that the result is clearly a false sort, the clustering result monitor means 31 adds the input data 6 to false-sorts input data DB8 and counts the number.
[0041] When the number of data in false-sorts input data DB8 exceeds a fixed number, the SOM correcting means 32 creates, according to the flow chart of drawing 3, a small SOM10 whose map size equals the number of vertical or horizontal neurones of SOM9, using the data of false-sorts input data DB8 as input. False-sorts input data DB8 is then cleared, and the cluster creating means 13 is directed to create a cluster. As described above, the cluster creating means 13 creates a cluster using SOM10 and adds it to the already created cluster 5.
[0042] As described above, in the clustering device of this embodiment, which clusters using SOM, the existing SOM can be applied as it is, and only a very small SOM is used when a cluster is newly created, so the processing speed is high and the practical effect is large. In addition, since only the data automatically collected as false-sorts data in the course of clustering live data are used, the device can respond to a dynamic change of input data by creating this new cluster.

[0043] 

[Effect of the Invention] As is clear from the above explanation, the clustering device of this invention can promptly create a new cluster in response to a dynamic change of input data, and can suppress false sorts resulting from dynamic changes of input data.

[0044] Since this new cluster is created using only the automatically collected data, the burden of its creation is small.

[0045] In a device with means to create a cluster directly from the data by which false sorts were carried out, the advantageous effect is obtained that a cluster can be created automatically even during operation, and that the device can respond promptly to a dynamic change of input data.

[0046] In the device that clusters using SOM, the existing SOM can be applied as it is, and only a very small SOM is used when a cluster is newly created, so the effect of high processing speed is also obtained.

[0047] Thus, this invention can be applied with good effect to devices that cluster input data which change over time. For example, it is very effective when used in a clustering device handling data that change with time, such as a clustering device which investigates the preferences of viewers who access Internet homepages.
[Brief Description of the Drawings]

[Drawing 1] A block diagram showing the composition of the clustering device in a 1st embodiment

[Drawing 2] A block diagram showing the composition of the clustering device in a 2nd embodiment

[Drawing 5] A figure showing the formulas

[Drawing 6] A figure showing an example of a conventional clustering device

1 Cluster preparation device 

2 Clustering performing device 

3 Clustering result monitoring instrument 

4 Prototype data DB 

5 Cluster 

6 Input data 

7 Clustering result 

8 False-sorts input data DB
9, 10 SOM

11 Data input means

12 SOM preparing means 

13 Cluster creating means 

31 Clustering result monitor means 

32 SOM correcting means 

100 Prototype data constellation 

102 Cluster A 

103 Cluster B

104 Distance 

105 Cluster C 

200 The cluster result created from prototype data 

201 Cluster Y

202 Cluster Z 

300 A clustering device using a cluster 

301 Cluster Y

302 Cluster Z 

303 Input X 

304 The point of the input X 

401 SOM 

402 Neurone 

403 Reference vector 